-
Notifications
You must be signed in to change notification settings - Fork 61
Enable -fno-reciprocal-math to fix div accuracy. #2413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR addresses a division accuracy issue in SYCL kernels where dividing a number by itself could return non-1 results. The fix adds the -fno-reciprocal-math compiler flag to disable reciprocal math optimizations that can introduce numerical inaccuracies.
Key Changes:
- Added
-fno-reciprocal-mathtoSYCL_KERNEL_OPTIONSto prevent compiler optimizations that replace division operations with multiplication by reciprocals
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Currently evaluating the use of a more localized approach using |
| # gcc -shared host.o kernel.o device-code.o -o libxxx.so | ||
| set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-sycl-unnamed-lambda) | ||
| set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -sycl-std=2020) | ||
| set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-reciprocal-math) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's study the NVCC behaviors first. I think reciprocal-math is a general optimization, and most the compilers enable it by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EikanWang I believe --prec-div https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html?highlight=reciprocal#prec-div-true-false-prec-div is the corresponding NVCC flag by default set to True unless --use_fast_math (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html?highlight=reciprocal#use-fast-math-use-fast-math) is set to True. I do not believe pytorch is built with --use_fast_math correct me if I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the information. Some HW platforms like CPU turn on this option by default. For CUDA, it keeps its default, saying use_fast_math is False. So, it makes sense to add no-reciprocal-math. I'm curious why SYCL turns on this option by default.
@fengyuan14 do you know the background?
torch-xpu-ops/cmake/BuildFlags.cmake
Line 77 in 20baf08
| # The fast-math will be enabled by default in SYCL compiler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per my understanding, we should disable fast-math to align with CUDA
|
There are a few options regarding |
@SanityRemnants , Although the performance is the king, I'd prefer to keep a high bar of maintenance when we have no concrete performance impact.
|
There is no such issue on CUDA. This parameter is neither in IPEX or in PyTorch. We are waiting for this PR to fix related issue. Thank you! |
Added -fno-reciprocal-math to SYCL_KERNEL_OPTIONS to eliminate issues with division of the number by itself returning non 1 output which caused: #1895.